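A Partitioner captures how an RDD's data is distributed across partitions at the output of an operation. The scheduler can use this information to optimize later operations, for example by skipping a shuffle when the data is already partitioned as required.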
Partitioner
val partitioner: Option[Partitioner] specifies how the RDD is partitioned. It is None when the RDD has no known partitioner.
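As a minimal sketch of how the partitioner property behaves, the following example inspects it on a few pair RDDs. It assumes a local Spark installation on the classpath; the object name PartitionerExample, the application name, the sample data, and the choice of 4 partitions are illustrative and do not come from the text above.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical example object; names and data are illustrative only.
object PartitionerExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("partitioner-example")
      .setMaster("local[2]") // assumption: run locally with 2 threads
    val sc = new SparkContext(conf)
    try {
      // A freshly parallelized collection has no known partitioner.
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      assert(pairs.partitioner.isEmpty)

      // partitionBy shuffles the data and records the partitioner on the result.
      val hashed = pairs.partitionBy(new HashPartitioner(4))
      assert(hashed.partitioner.map(_.numPartitions) == Some(4))

      // mapValues cannot change keys, so the partitioner is preserved.
      assert(hashed.mapValues(_ + 1).partitioner.isDefined)

      // map may change keys, so the partitioner is dropped.
      assert(hashed.map(identity).partitioner.isEmpty)
    } finally {
      sc.stop()
    }
  }
}
```

Because the partitioner survives mapValues but not map, preferring key-preserving transformations on an already partitioned RDD gives the scheduler a chance to reuse the existing data layout.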
numPartitions